Performance of Matrix Computation

Course: Numerical Analysis

Matrix Computation

Many numerical computations reduce to matrix or vector operations
A variety of numerical linear algebra libraries exist
- BLAS
- LAPACK, LINPACK
- PETSc, and others

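BLAS Library

A library of basic matrix/vector operations
Vector-vector (Level 1), matrix-vector (Level 2), and matrix-matrix (Level 3) operations
Hardware vendors provide optimized implementations such as Intel MKL and NVIDIA cuBLAS
- Other libraries such as OpenBLAS also exist
NumPy also relies on BLAS and LAPACK to carry out array operations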
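As a rough illustration of the three BLAS levels, the sketch below performs one operation of each kind through NumPy and counts the floating-point operations involved. Whether a given call is actually dispatched to a BLAS routine depends on how NumPy was built and on the array dtypes, so the level labels in the comments are an assumption. The helper name `blas_level_flops` is made up for this example.

```python
import numpy as np


def blas_level_flops(m, n, l):
    """Approximate floating-point operation counts, one operation per BLAS level."""
    return {
        "level1_dot": 2 * n,           # x . y : about n multiplies and n adds
        "level2_gemv": 2 * m * n,      # A @ x : m dot products of length n
        "level3_gemm": 2 * m * n * l,  # A @ B : m*l dot products of length n
    }


m, n, l = 4, 3, 2
rng = np.random.default_rng(0)
x, y = rng.random(n), rng.random(n)
A, B = rng.random((m, n)), rng.random((n, l))

s = x @ y  # Level 1 style: vector-vector, returns a scalar
v = A @ x  # Level 2 style: matrix-vector, returns a vector of length m
C = A @ B  # Level 3 style: matrix-matrix, returns an (m, l) matrix

counts = blas_level_flops(m, n, l)
print(np.ndim(s), v.shape, C.shape)
print(counts)
```

For square matrices, a Level 3 operation does on the order of n^3 arithmetic on n^2 data, which is one reason matrix-matrix multiplication is commonly used to approach a machine's peak floating-point rate.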

FLOPS

Floating Point Operations per Second
The number of floating-point operations performed per second
One of the main performance metrics of a computer
- TOP500 list
It is by no means the only performance metric!

Measuring GEMM Performance

GEMM (General Matrix-Matrix Multiplication)
- A representative benchmark for measuring computational speed
Using the %timeit magic command, measure the average execution time and FLOPS for \(m=l=n=4096\)
- Measure both double precision and single precision
- Compare the results with the theoretical peak performance of your CPU
  - https://en.wikichip.org/wiki/flops
Example
- Intel Core i7-1360P processor
- Theoretical peak: 16 DP FLOPs/cycle * 4 cores (P) * 5.0 GHz = 320 GFLOPS
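The peak estimate in the example is just the product of three numbers, which can be written out as a small helper. This is a minimal sketch: the function name `peak_gflops` is hypothetical, and the measured value used for the efficiency line is a placeholder, not a result from this document.

```python
def peak_gflops(flops_per_cycle, cores, clock_ghz):
    """Theoretical peak in GFLOPS = FLOPs/cycle per core * cores * clock in GHz."""
    return flops_per_cycle * cores * clock_ghz


# Values from the i7-1360P example: 16 DP FLOPs/cycle, 4 P-cores, 5.0 GHz.
peak = peak_gflops(16, 4, 5.0)
print("Theoretical peak: {:.1f} GFLOPS".format(peak))

# Efficiency relative to a measurement (placeholder value, replace with your own).
measured = 200.0
efficiency = 100 * measured / peak
print("Efficiency: {:.1f} %".format(efficiency))
```

The FLOPs/cycle figure depends on the microarchitecture and on the precision, so look it up for your own CPU before comparing.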

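import numpy as np

# Matrix dimensions: a is (m, n), b is (n, l), c is (m, l)
m = n = l = 4096

# Random double-precision (float64) inputs; c is a preallocated output buffer
a = np.random.rand(m, n)
b = np.random.rand(n, l)
c = np.random.rand(m, l)

# IPython magic: -o returns a TimeitResult object holding the timing statistics
t = %timeit -o c[:] = a @ b

# GEMM takes about 2*m*n*l floating-point operations (one multiply and one add per term)
flops = 2*m*n*l / t.average
print("Measured FLOPS : {:.4f} GFLOPS".format(flops*1e-9))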

690 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Measured FLOPS : 199.2197 GFLOPS
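The `%timeit` magic is only available in IPython or Jupyter. For the double versus single precision comparison, a plain-Python sketch using `time.perf_counter` could look like the following. The helper `measure_gemm_gflops` is a hypothetical name, and the small matrix size is an assumption chosen so that the script runs quickly; set `n = 4096` to match the exercise.

```python
import time

import numpy as np


def measure_gemm_gflops(n, dtype, repeats=3):
    """Time an n x n matrix product in the given dtype; return the average rate in GFLOPS."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n)).astype(dtype)
    b = rng.random((n, n)).astype(dtype)
    a @ b  # warm-up run, not timed

    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        total += time.perf_counter() - start

    average = total / repeats
    return 2 * n**3 / average * 1e-9


n = 512  # assumption: small size for a quick run; use 4096 to match the exercise
results = {}
for dtype in (np.float64, np.float32):
    name = np.dtype(dtype).name
    results[name] = measure_gemm_gflops(n, dtype)
    print("{:8s}: {:.2f} GFLOPS".format(name, results[name]))
```

On CPUs with SIMD units, single precision often runs at roughly twice the double-precision rate because each vector register holds twice as many values, but the actual ratio depends on the BLAS build, the matrix size, and thermal limits.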